Estimating Dominance Norms of Multiple Data Streams

Authors

  • Graham Cormode
  • S. Muthukrishnan
Abstract

There is much focus in the algorithms and database communities on designing tools to manage and mine data streams. Typically, data streams consist of multiple signals. Formally, a stream of multiple signals is $(i, a_{i,j})$, where the $i$'s correspond to the domain, the $j$'s index the different signals, and $a_{i,j} \ge 0$ gives the value of the $j$th signal at point $i$. We study the problem of finding norms that are cumulative of the multiple signals in the data stream. For example, consider the max-dominance norm, defined as $\sum_i \max_j \{a_{i,j}\}$. It may be thought of as estimating the norm of the “upper envelope” of the multiple signals, or alternatively, as estimating the norm of the “marginal” distribution of tabular data streams. It is used in applications to estimate the “worst case influence” of multiple processes, for example in IP traffic analysis, electrical grid monitoring and the financial domain. In addition, it is a natural measure, generalizing the union of data streams or counting distinct elements in data streams. We present the first known data stream algorithms for estimating max-dominance of multiple signals. In particular, we use workspace and time-per-item that are both sublinear (in fact, poly-logarithmic) in the input size. In contrast, other notions of dominance on streams $a$, $b$, namely min-dominance ($\sum_i \min_j \{a_{i,j}\}$), count-dominance ($|\{i \mid a_i > b_i\}|$) and relative-dominance ($\sum_i a_i / \max\{1, b_i\}$), are all impossible to estimate accurately with sublinear space.
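To make the four definitions in the abstract concrete, the following is a minimal Python sketch that computes them exactly on small inputs. It is not the paper's algorithm: it stores one value per distinct domain point, so it uses linear space, whereas the abstract's contribution is a poly-logarithmic-space estimator for max-dominance. The function names, the `(i, value)` tuple layout for stream items, and the dictionary representation of the two streams `a` and `b` are assumptions made for illustration.

```python
def max_dominance(stream):
    """Exact sum_i max_j a_{i,j} over a stream of (i, value) items.

    Illustrative baseline only: keeps one entry per distinct i (linear space).
    """
    largest = {}
    for i, value in stream:
        if value < 0:
            raise ValueError("signal values must be non-negative")
        # Keep the largest value seen so far at domain point i.
        largest[i] = max(largest.get(i, 0), value)
    return sum(largest.values())


def min_dominance(a, b):
    """Exact sum_i min(a_i, b_i) for two signals given as dicts i -> value."""
    return sum(min(a.get(i, 0), b.get(i, 0)) for i in a.keys() | b.keys())


def count_dominance(a, b):
    """Exact |{i : a_i > b_i}| for two signals given as dicts i -> value."""
    return sum(1 for i in a.keys() | b.keys() if a.get(i, 0) > b.get(i, 0))


def relative_dominance(a, b):
    """Exact sum_i a_i / max(1, b_i) for two signals given as dicts."""
    return sum(a_i / max(1, b.get(i, 0)) for i, a_i in a.items())


# Two hypothetical signals over the domain {1, 2, 3}.
a = {1: 3, 2: 0, 3: 5}
b = {1: 1, 2: 4, 3: 5}

# The same data as one interleaved stream of (i, value) items.
stream = [(i, v) for signal in (a, b) for i, v in signal.items()]

print(max_dominance(stream))     # upper envelope is (3, 4, 5)
print(min_dominance(a, b))
print(count_dominance(a, b))
print(relative_dominance(a, b))
```

When every value in the stream is 1, `max_dominance` returns the number of distinct domain points, which is the sense in which the abstract describes max-dominance as generalizing distinct-element counting.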


Related resources

Solving the Paradox of Multiple IRR's in Engineering Economic Problems by Choosing an Optimal -cut

Until now, single values of IRR have traditionally been used to estimate the time value of cash flows. Since uncertainty exists in estimating cost data, the resulting decision may not be reliable. The most commonly cited drawback to using the internal rate of return in evaluation of deterministic cash flow streams is the possibility of multiple conflicting internal rates of return. In this paper we p...


Comparing Data Streams Using Hamming Norms (How to Zero In)

Massive data streams are now fundamental to many data processing applications. For example, Internet routers produce large scale diagnostic data streams. Such streams are rarely stored in traditional databases, and instead must be processed “on the fly” as they are produced. Similarly, sensor networks produce multiple data streams of observations from their sensors. There is growing focus on ma...


Max-stable sketches: estimation of Lp-norms, dominance norms and point queries for non-negative signals

Let f : {1, 2, ..., N} → [0, ∞) be a non-negative signal, defined over a very large domain, and suppose that we want to be able to address approximate aggregate queries or point queries about f. To answer queries about f, we introduce a new type of random sketches called max-stable sketches. The (ideal precision) max-stable sketch of f, Ej(f), 1 ≤ j ≤ K, is defined as: Ej(f) := max 1≤i≤N f(i...


Priority Setting Meets Multiple Streams: A Match to Be Further Examined?; Comment on “Introducing New Priority Setting and Resource Allocation Processes in a Canadian Healthcare Organization: A Case Study Analysis Informed by Multiple Streams Theory”

With demand for health services continuing to grow as populations age and new technologies emerge to meet health needs, healthcare policy-makers are under constant pressure to set priorities, i.e., to make choices about the health services that can and cannot be funded within available resources. In a recent paper, Smith et al. apply an influential policy studies framework – Kingdon’s multiple str...


Selectivity Estimation over Multiple Data Streams using Micro-clustering

Selectivity estimation is an important task for query optimization. We propose a technique to perform range query estimation over multiple data streams using micro-clustering. The technique maintains cluster statistics in terms of micro-clusters and cosine series for all streams. These micro-clusters maintain data distribution information about the stream values using cosine coefficients. These ...




Publication date: 2003